test: enable force inclusion, re-enable fi e2e and add fi benchmarks #3005

julienrbrt · 2026-01-22T10:52:20Z

ref: #2965 (comment)
ref: #2982 (comment)

github-actions · 2026-01-22T10:52:32Z

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`⏩ skipped`	`✅ passed`	`✅ passed`	Jan 26, 2026, 10:56 PM

gemini-code-assist · 2026-01-22T10:52:36Z

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enabling and validating the 'force inclusion' functionality within the system. It removes a previous block in the configuration validation that prevented the feature from being active and subsequently re-enables the comprehensive suite of end-to-end tests designed to verify its correct operation across various EVM components.

Highlights

Force Inclusion Feature Activation: The 'forced inclusion' feature has been activated by removing a temporary error message that previously prevented its use in pkg/config/config.go. The configuration now correctly validates the ForcedInclusionNamespace.
E2E Tests Re-enabled: All end-to-end tests related to EVM force inclusion, including sequencer, full node, and malicious sequencer scenarios, have been re-enabled in test/e2e/evm_force_inclusion_e2e_test.go by removing t.Skip() directives.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

claude · 2026-01-22T10:52:38Z

Claude finished @julienrbrt's task in 2m 16s —— View job

Comprehensive Code Review

I've completed a thorough review of the force inclusion enablement changes across 3 files. Overall, the changes are well-implemented and achieve the stated goal of enabling and validating the force inclusion functionality. Below are my findings organized by category.

Summary

This PR successfully enables the force inclusion feature by:

Removing the temporary validation block in config (pkg/config/config.go:295-299)
Re-enabling E2E tests that validate the feature (test/e2e/evm_force_inclusion_e2e_test.go)
Adding comprehensive benchmarks for FilterTxs performance (execution/evm/filter_bench_test.go)

The changes are straightforward and effective. I've identified several areas for improvement below.

Critical Issues

1. Hard-coded Private Key in Benchmark (Security)

Location: execution/evm/filter_bench_test.go:18

const (
    benchPrivateKey = "cece4f25ac74deb1468965160c7185e07dff413f23fcadb611b05ca37ab0a52e"

Issue: Hard-coded private key in source code, even for testing purposes.

Impact: While this is a test-only key, hard-coding private keys violates security best practices and could be flagged by security scanners. If this key was ever used elsewhere or copied to production code, it would be a critical vulnerability.

Recommendation: Generate the key dynamically in tests or use a well-documented test-only constant with clear warnings.

High Priority Issues

2. Missing Test Coverage for Config Validation (Testing)

Location: pkg/config/config.go:295-299

The PR removes the temporary error block that prevented force inclusion, enabling this validation:

if len(c.DA.GetForcedInclusionNamespace()) > 0 {
    if err := validateNamespace(c.DA.GetForcedInclusionNamespace()); err != nil {
        return fmt.Errorf("could not validate forced inclusion namespace (%s): %w", c.DA.GetForcedInclusionNamespace(), err)
    }
}

Issue: No unit tests cover this validation path. The codecov report shows 0% patch coverage with 2 missing lines - these are likely the validation lines.

Recommendation: Add unit tests for config validation:

Test valid forced inclusion namespace
Test invalid forced inclusion namespace (should return error)
Test empty forced inclusion namespace (should skip validation per line 295 check)

Fix this →

3. Inconsistent Flag Naming Convention (Code Quality)

Location: test/e2e/evm_force_inclusion_e2e_test.go:92, 113, 219, 374

As noted in the review comments, there's inconsistency between --evnode.* and --evm.* flag prefixes:

"--evnode.node.aggregator=true",  // evnode prefix
"--evm.jwt-secret-file", jwtSecretFile,  // evm prefix

Issue: Mixed flag naming conventions reduce code maintainability and can confuse users.

Recommendation: Standardize on one prefix convention. Based on CLAUDE.md guidance, --evnode.* appears to be the project standard for rollkit/ev-node flags, while --evm.* should be reserved for EVM-specific execution layer flags.

Medium Priority Issues

4. Benchmark Helper Function Naming (Code Quality)

Location: execution/evm/filter_bench_test.go:109-114

func benchName(n int) string {
    if n >= 1000 {
        return fmt.Sprintf("%dk", n/1000)
    }
    return fmt.Sprintf("%d", n)
}

Issue: Generic function name benchName doesn't clearly indicate it formats transaction counts. Could conflict with other benchmark helpers.

Recommendation: Rename to formatTxCount or benchmarkName for clarity.

5. Magic Numbers in Benchmark Setup (Code Quality)

Location: execution/evm/filter_bench_test.go:53, 203, 230

data := make([]byte, 16)  // Why 16?
maxGas := uint64(21000 * 500)  // Why 500?
maxBytes := uint64(110 * 500)  // Why 110 and 500?

Issue: Magic numbers without explanation make benchmarks harder to maintain.

Recommendation: Add constants or comments explaining the values:

const (
    benchmarkTxDataSize = 16  // Arbitrary data payload for realistic tx size
    benchmarkGasPerTx = 21000 // Standard ETH transfer gas
    benchmarkHalfBatchSize = 500  // Gas limit allowing ~half of tx batch
    benchmarkAvgTxSize = 110  // Average transaction size in bytes
)

6. Potential for Invalid Transaction Generation (Reliability)

Location: execution/evm/filter_bench_test.go:96-101

if includeInvalid && i%10 == 0 {
    // Every 10th transaction is invalid (random garbage)
    txs[i] = make([]byte, 100)
    if _, err := rand.Read(txs[i]); err != nil {
        b.Fatalf("failed to generate random data: %v", err)
    }
}

Issue: Generating 100 bytes of random data might accidentally create valid RLP or transaction data (extremely unlikely but theoretically possible).

Recommendation: Use deliberately malformed data that's guaranteed to fail parsing, such as []byte{0xFF, 0xFF, 0xFF} or truncated RLP.

7. E2E Test Timeout Values (Reliability)

Location: test/e2e/evm_force_inclusion_e2e_test.go:172, 272, 495

}, 30*time.Second, 1*time.Second, "Forced inclusion transaction not included")
}, 40*time.Second, 1*time.Second, "Forced inclusion tx not synced to full node")
for i := 0; i < 120; i++ {  // 60 seconds total
    time.Sleep(500 * time.Millisecond)

Issue: Timeout values vary across tests (30s, 40s, 60s) without clear rationale. These could be too long for fast CI or too short for slow environments.

Recommendation: Extract timeout constants based on expected behavior:

const (
    forceInclusionTimeout = 30 * time.Second  // Based on epoch=2 and DA block time
    syncTimeout = 40 * time.Second  // Extra buffer for P2P sync
    maliciousBehaviorDetectionTimeout = 60 * time.Second
)

Low Priority / Observations

8. Verbose E2E Test Documentation (Positive)

Location: test/e2e/evm_force_inclusion_e2e_test.go:366-400

The TestEvmSyncerMaliciousSequencerForceInclusionE2E test has excellent documentation explaining:

Test architecture
Test flow
Key configuration
Expected outcome

Observation: This is exemplary test documentation. Consider this pattern for other complex E2E tests.

9. Benchmark Coverage (Positive)

Location: execution/evm/filter_bench_test.go

The benchmark suite thoroughly covers:

Normal transactions (no unmarshal)
Forced transactions (with unmarshal)
Mixed valid/invalid transactions
Gas limit constraints
Size limit constraints
Direct overhead comparison

Observation: Comprehensive benchmark coverage will help identify performance regressions.

10. Error Logging in FilterTxs (Performance)

Location: execution/evm/execution.go:861-865 (from grep context)

c.logger.Debug().
    Int("tx_index", i).
    Err(err).
    Str("tx_hex", "0x"+hex.EncodeToString(tx)).
    Msg("filtering out invalid transaction (gibberish)")

Observation: The debug logging includes full transaction hex, which could be large for blob transactions. This is acceptable for debug level but consider adding a max length or using a hash for very large transactions.

Performance Considerations

11. UnmarshalBinary Overhead

Location: execution/evm/execution.go:858-869

The FilterTxs function now conditionally calls UnmarshalBinary only when hasForceIncludedTransaction is true.

Analysis:

Good optimization to skip unmarshaling for mempool transactions
Benchmarks will quantify the performance impact
Critical path for block production performance

Recommendation: Consider running the new benchmarks in CI to track performance over time and detect regressions.

Security Analysis

12. Namespace Validation

Location: pkg/config/config.go:295-299, 310-320

The validation properly checks:

Non-empty namespace (line 312)
Valid celestia namespace format (line 316)

Analysis: Security validation is appropriate. The check prevents invalid namespace configurations that could cause runtime errors or unexpected behavior.

13. Forced Inclusion Security Model

Location: test/e2e/evm_force_inclusion_e2e_test.go:401-556

The malicious sequencer test validates that:

Sync nodes detect when forced transactions are censored
Grace period is respected
Node stops syncing when malicious behavior is detected

Analysis: This is a critical security property. The test properly validates censorship resistance.

Question: What happens in production when a sync node detects malicious behavior? Does it halt, log and continue, or attempt recovery? This should be documented.

Documentation

14. Missing Benchmark Documentation

Location: execution/evm/filter_bench_test.go

Issue: The benchmark file lacks a package-level comment explaining:

What is being benchmarked
How to run the benchmarks
How to interpret results
Why these benchmarks matter for force inclusion performance

Recommendation: Add package documentation:

// Package evm benchmarks test the performance impact of force inclusion on transaction filtering.
//
// Force inclusion requires unmarshaling and validating all transactions to extract gas limits,
// which adds overhead compared to mempool transactions that are pre-validated.
//
// Run benchmarks:
//   go test -bench=. -benchmem ./execution/evm/
//
// Compare before/after:
//   go test -bench=BenchmarkFilterTxs_CompareUnmarshalOverhead -benchmem

Go Idioms & Best Practices

15. Use of b.Helper() (Positive)

Location: execution/evm/filter_bench_test.go:26, 36, 81, etc.

All helper functions properly call b.Helper() to ensure correct error reporting line numbers.

Observation: Good Go testing practice.

16. Benchmark Loop Usage (Positive)

Location: execution/evm/filter_bench_test.go:131, 157, etc.

Benchmarks correctly use b.Loop() (Go 1.24+ API) instead of the older for i := 0; i < b.N; i++ pattern.

Observation: Modern Go benchmark idiom.

Recommendations Summary

Must Fix:

Add unit tests for forced inclusion namespace validation (0% coverage)
Address hard-coded private key (security best practice)

Should Fix:
3. Standardize flag naming convention (evnode vs evm prefix)
4. Extract magic numbers to named constants
5. Document benchmark purpose and usage

Nice to Have:
6. Extract E2E timeout constants
7. Improve invalid transaction generation
8. Add FilterTxs performance regression tracking in CI

Final Assessment

Code Quality: ✅ Good

Clean, readable code following Go idioms
Excellent test documentation
Comprehensive benchmark coverage

Security: ✅ Adequate

Proper namespace validation
Critical censorship resistance validated
Minor issue: hard-coded test key

Testing: ⚠️ Needs Improvement

E2E coverage is excellent
Missing unit tests for config validation (0% patch coverage)
Benchmark coverage is comprehensive

Performance: ✅ Good

Smart optimization to skip unmarshal when not needed
Comprehensive benchmarks to track overhead
Consider adding to CI for regression detection

Overall: This PR successfully enables force inclusion and provides strong E2E validation. The main gap is unit test coverage for the config validation changes. Once that's addressed, this will be production-ready.

gemini-code-assist

Code Review

This pull request enables the force inclusion feature. The changes in pkg/config/config.go correctly re-enable the validation logic for the forced inclusion namespace, and the changes in test/e2e/evm_force_inclusion_e2e_test.go re-enable the end-to-end tests for this feature by removing t.Skip() calls. The changes are straightforward and effectively achieve the goal of activating the force inclusion functionality and its associated tests. The pull request title also mentions the addition of force inclusion benchmarks, but no new benchmark files were found in the changes; this might be planned for a subsequent pull request.

codecov · 2026-01-22T10:56:30Z

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.97%. Comparing base (2325ace) to head (d0bab1c).

Files with missing lines	Patch %	Lines
pkg/config/config.go	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3005      +/-   ##
==========================================
- Coverage   57.98%   57.97%   -0.01%     
==========================================
  Files         110      110              
  Lines       10522    10523       +1     
==========================================
  Hits         6101     6101              
- Misses       3770     3771       +1     
  Partials      651      651

Flag	Coverage Δ
combined	`57.97% <0.00%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

test/e2e/evm_force_inclusion_e2e_test.go

test: enable force inclusion, re-enable fi e2e and add fi benchmarks

c92c7d8

github-actions bot assigned julienrbrt Jan 22, 2026

gemini-code-assist bot reviewed Jan 22, 2026

View reviewed changes

julienrbrt added 5 commits January 22, 2026 11:56

add filter tx bench

670313a

updates

055f6ee

Merge branch 'main' into julien/test-fi

4ed0a4b

check others

cab4a67

Merge branch 'main' into julien/test-fi

2f7b304

randygrok reviewed Jan 26, 2026

View reviewed changes

test/e2e/evm_force_inclusion_e2e_test.go Outdated Show resolved Hide resolved

julienrbrt added 4 commits January 26, 2026 16:52

Merge branch 'main' into julien/test-fi

36ce299

fix flags

cb09920

re-enable test

54d5917

fix connection

d0bab1c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: enable force inclusion, re-enable fi e2e and add fi benchmarks #3005

test: enable force inclusion, re-enable fi e2e and add fi benchmarks #3005

julienrbrt commented Jan 22, 2026

Uh oh!

github-actions bot commented Jan 22, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Jan 22, 2026

Uh oh!

claude bot commented Jan 22, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

codecov bot commented Jan 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

test: enable force inclusion, re-enable fi e2e and add fi benchmarks #3005

Are you sure you want to change the base?

test: enable force inclusion, re-enable fi e2e and add fi benchmarks #3005

Conversation

julienrbrt commented Jan 22, 2026

Uh oh!

github-actions bot commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot commented Jan 22, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

claude bot commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comprehensive Code Review

Summary

Critical Issues

1. Hard-coded Private Key in Benchmark (Security)

High Priority Issues

2. Missing Test Coverage for Config Validation (Testing)

3. Inconsistent Flag Naming Convention (Code Quality)

Medium Priority Issues

4. Benchmark Helper Function Naming (Code Quality)

5. Magic Numbers in Benchmark Setup (Code Quality)

6. Potential for Invalid Transaction Generation (Reliability)

7. E2E Test Timeout Values (Reliability)

Low Priority / Observations

8. Verbose E2E Test Documentation (Positive)

9. Benchmark Coverage (Positive)

10. Error Logging in FilterTxs (Performance)

Performance Considerations

11. UnmarshalBinary Overhead

Security Analysis

12. Namespace Validation

13. Forced Inclusion Security Model

Documentation

14. Missing Benchmark Documentation

Go Idioms & Best Practices

15. Use of b.Helper() (Positive)

16. Benchmark Loop Usage (Positive)

Recommendations Summary

Final Assessment

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

codecov bot commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Jan 22, 2026 •

edited

Loading

claude bot commented Jan 22, 2026 •

edited

Loading

codecov bot commented Jan 22, 2026 •

edited

Loading